Overview

Dataset statistics

Number of variables: 20
Number of observations: 10000
Missing cells: 8070
Missing cells (%): 4.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 168.0 B
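
The missing-cell percentage is consistent with dividing the missing-cell count by the total number of cells (variables × observations). The count itself also matches the Alerts section, which lists five variables with 1614 missing values each. A minimal sketch of that arithmetic, using only figures from this report (the variable names are illustrative):

```python
# Figures copied from the dataset statistics and the "Missing" alerts.
n_variables = 20
n_observations = 10_000
missing_cells = 8_070
columns_with_missing = 5      # variables flagged "Missing" in the alerts
missing_per_column = 1_614    # missing values reported for each of them

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
missing_from_alerts = columns_with_missing * missing_per_column

print(f"total cells: {total_cells}")
print(f"missing cells (%): {missing_pct:.1f}%")
print(f"missing cells implied by alerts: {missing_from_alerts}")
```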

Variable types

Categorical: 7
DateTime: 2
Numeric: 10
Boolean: 1

Alerts

Airport_fee is highly overall correlated with fare_amount and 3 other fields [High correlation]
RatecodeID is highly overall correlated with improvement_surcharge [High correlation]
VendorID is highly overall correlated with extra [High correlation]
congestion_surcharge is highly overall correlated with improvement_surcharge and 2 other fields [High correlation]
extra is highly overall correlated with VendorID [High correlation]
fare_amount is highly overall correlated with Airport_fee and 2 other fields [High correlation]
improvement_surcharge is highly overall correlated with RatecodeID and 2 other fields [High correlation]
mta_tax is highly overall correlated with congestion_surcharge and 2 other fields [High correlation]
store_and_fwd_flag is highly overall correlated with trip_distance [High correlation]
tip_amount is highly overall correlated with total_amount [High correlation]
tolls_amount is highly overall correlated with Airport_fee [High correlation]
total_amount is highly overall correlated with Airport_fee and 4 other fields [High correlation]
trip_distance is highly overall correlated with Airport_fee and 4 other fields [High correlation]
VendorID is highly imbalanced (51.9%) [Imbalance]
store_and_fwd_flag is highly imbalanced (97.6%) [Imbalance]
mta_tax is highly imbalanced (87.9%) [Imbalance]
improvement_surcharge is highly imbalanced (86.9%) [Imbalance]
congestion_surcharge is highly imbalanced (68.2%) [Imbalance]
Airport_fee is highly imbalanced (73.3%) [Imbalance]
passenger_count has 1614 (16.1%) missing values [Missing]
RatecodeID has 1614 (16.1%) missing values [Missing]
store_and_fwd_flag has 1614 (16.1%) missing values [Missing]
congestion_surcharge has 1614 (16.1%) missing values [Missing]
Airport_fee has 1614 (16.1%) missing values [Missing]
trip_distance is highly skewed (γ1 = 99.99766163) [Skewed]
trip_distance has 243 (2.4%) zeros [Zeros]
extra has 5071 (50.7%) zeros [Zeros]
tip_amount has 3268 (32.7%) zeros [Zeros]
tolls_amount has 9374 (93.7%) zeros [Zeros]
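
The imbalance percentages can be reproduced, to the reported precision, as one minus the Shannon entropy of the category frequencies divided by the maximum possible entropy for that number of categories. The sketch below assumes that scoring formula; the report does not state it, although the results agree with the values shown for VendorID and store_and_fwd_flag. The function name `imbalance_score` is hypothetical, and the counts are taken from the Variables section with missing values excluded.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of category counts (missing values excluded)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single category has no entropy to normalise
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))

vendor_score = imbalance_score([7814, 2182, 4])   # VendorID values 2, 1, 7
flag_score = imbalance_score([8366, 20])          # store_and_fwd_flag False, True

print(f"VendorID: {vendor_score:.1%}")
print(f"store_and_fwd_flag: {flag_score:.1%}")
```

A score near 100% means almost all rows share one value, as with store_and_fwd_flag, where only 20 of the 8386 non-missing rows are True.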

Reproduction

Analysis started: 2025-06-04 06:58:51.563815
Analysis finished: 2025-06-04 06:59:47.909420
Duration: 56.35 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
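
The reported duration is the difference between the two timestamps above. A small sketch using only the standard library (the format string is an assumption about how the timestamps are laid out):

```python
from datetime import datetime

# Timestamp layout assumed from the two values shown in this section.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-06-04 06:58:51.563815", FMT)
finished = datetime.strptime("2025-06-04 06:59:47.909420", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```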

Variables

VendorID
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 10000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value  Count  Frequency (%)
2      7814   78.1%
1      2182   21.8%
7      4      < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring characters

Value  Count  Frequency (%)
2      7814   78.1%
1      2182   21.8%
7      4      < 0.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  10000  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  10000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  10000  100.0%

Distinct: 9979
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2025-01-01 00:02:23
Maximum: 2025-01-31 23:59:50
Invalid dates: 0
Invalid dates (%): 0.0%
[Figure: Histogram with fixed size bins (bins=50)]

Distinct: 9979
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2025-01-01 00:13:19
Maximum: 2025-02-01 00:24:05
Invalid dates: 0
Invalid dates (%): 0.0%
[Figure: Histogram with fixed size bins (bins=50)]

passenger_count
Real number (ℝ)

Missing 

Distinct: 7
Distinct (%): 0.1%
Missing: 1614
Missing (%): 16.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3037205
Minimum: 0
Maximum: 6
Zeros: 77
Zeros (%): 0.8%
Negative: 0
Negative (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.76798056
Coefficient of variation (CV): 0.58906842
Kurtosis: 10.786931
Mean: 1.3037205
Median Absolute Deviation (MAD): 0
Skewness: 3.0059831
Sum: 10933
Variance: 0.58979415
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Common values

Value      Count  Frequency (%)
1          6636   66.4%
2          1116   11.2%
3          290    2.9%
4          177    1.8%
0          77     0.8%
5          53     0.5%
6          37     0.4%
(Missing)  1614   16.1%

Minimum values

Value  Count  Frequency (%)
0      77     0.8%
1      6636   66.4%
2      1116   11.2%
3      290    2.9%
4      177    1.8%
5      53     0.5%
6      37     0.4%

Maximum values

Value  Count  Frequency (%)
6      37     0.4%
5      53     0.5%
4      177    1.8%
3      290    2.9%
2      1116   11.2%
1      6636   66.4%
0      77     0.8%
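
The summary statistics for passenger_count can be cross-checked against its frequency table. The sketch below recomputes the count of non-missing rows, the sum, the mean and the variance from the value counts alone. Dividing by n - 1 (the sample variance) is an assumption, chosen because it reproduces the reported figure.

```python
# Non-missing value counts for passenger_count, copied from the frequency table.
counts = {0: 77, 1: 6636, 2: 1116, 3: 290, 4: 177, 5: 53, 6: 37}

n = sum(counts.values())                        # non-missing observations
total = sum(v * c for v, c in counts.items())   # corresponds to the reported Sum
mean = total / n
# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)

print(f"n={n} sum={total} mean={mean:.7f} variance={variance:.8f}")
```

The non-missing count also equals the number of observations minus the 1614 missing values.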

trip_distance
Real number (ℝ)

High correlation  Skewed  Zeros 

Distinct: 1338
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.537008
Minimum: 0
Maximum: 104448.07
Zeros: 243
Zeros (%): 2.4%
Negative: 0
Negative (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.4
Q1: 0.98
Median: 1.66
Q3: 3.09
95-th percentile: 12
Maximum: 104448.07
Range: 104448.07
Interquartile range (IQR): 2.11

Descriptive statistics

Standard deviation: 1044.4579
Coefficient of variation (CV): 77.155743
Kurtosis: 9999.6882
Mean: 13.537008
Median Absolute Deviation (MAD): 0.85
Skewness: 99.997662
Sum: 135370.08
Variance: 1090892.3
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
0                    243    2.4%
1.2                  134    1.3%
1                    132    1.3%
1.1                  131    1.3%
0.6                  122    1.2%
0.8                  122    1.2%
0.7                  117    1.2%
1.3                  115    1.1%
0.9                  112    1.1%
1.5                  107    1.1%
Other values (1328)  8665   86.7%

Minimum values

Value  Count  Frequency (%)
0      243    2.4%
0.01   36     0.4%
0.02   9      0.1%
0.03   3      < 0.1%
0.04   2      < 0.1%
0.05   1      < 0.1%
0.06   1      < 0.1%
0.07   2      < 0.1%
0.08   2      < 0.1%
0.09   1      < 0.1%

Maximum values

Value      Count  Frequency (%)
104448.07  1      < 0.1%
53         1      < 0.1%
36.78      1      < 0.1%
36.26      1      < 0.1%
35.4       1      < 0.1%
35.27      1      < 0.1%
35.22      1      < 0.1%
34.74      1      < 0.1%
32.34      1      < 0.1%
32.3       1      < 0.1%
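
The extreme skewness and standard deviation of trip_distance appear to be driven by the single maximum value of 104448.07, since the next largest value is 53. A rough sketch of how much that one row moves the mean, using only the reported Sum, Maximum and standard deviation (no raw data is assumed):

```python
# Figures copied from the trip_distance statistics above.
n = 10_000
total = 135_370.08       # reported Sum
largest = 104_448.07     # reported Maximum, which occurs once
std = 1044.4579          # reported standard deviation

mean_all = total / n
mean_without_largest = (total - largest) / (n - 1)
cv = std / mean_all      # coefficient of variation = standard deviation / mean

print(f"mean (all rows): {mean_all:.6f}")
print(f"mean (largest value dropped): {mean_without_largest:.4f}")
print(f"coefficient of variation: {cv:.4f}")
```

With the largest value dropped, the mean falls close to Q3 (3.09), which is more in line with the quantile statistics than the reported mean of 13.537008.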

RatecodeID
Real number (ℝ)

High correlation  Missing 

Distinct: 6
Distinct (%): 0.1%
Missing: 1614
Missing (%): 16.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5149058
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 11.780707
Coefficient of variation (CV): 4.6843531
Kurtosis: 63.04852
Mean: 2.5149058
Median Absolute Deviation (MAD): 0
Skewness: 8.0587705
Sum: 21090
Variance: 138.78505
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value      Count  Frequency (%)
1          7889   78.9%
2          262    2.6%
99         123    1.2%
5          68     0.7%
4          28     0.3%
3          16     0.2%
(Missing)  1614   16.1%

Minimum values

Value  Count  Frequency (%)
1      7889   78.9%
2      262    2.6%
3      16     0.2%
4      28     0.3%
5      68     0.7%
99     123    1.2%

Maximum values

Value  Count  Frequency (%)
99     123    1.2%
5      68     0.7%
4      28     0.3%
3      16     0.2%
2      262    2.6%
1      7889   78.9%

store_and_fwd_flag
Boolean

High correlation  Imbalance  Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 1614
Missing (%): 16.1%
Memory size: 97.7 KiB

Value      Count  Frequency (%)
False      8366   83.7%
True       20     0.2%
(Missing)  1614   16.1%

PULocationID
Real number (ℝ)

Distinct: 178
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 166.423
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 48
Q1: 132
Median: 162
Q3: 234
95-th percentile: 249
Maximum: 265
Range: 264
Interquartile range (IQR): 102

Descriptive statistics

Standard deviation: 64.524239
Coefficient of variation (CV): 0.38771227
Kurtosis: -0.84403529
Mean: 166.423
Median Absolute Deviation (MAD): 64
Skewness: -0.29412145
Sum: 1664230
Variance: 4163.3774
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
161                 508    5.1%
237                 473    4.7%
236                 451    4.5%
132                 427    4.3%
230                 368    3.7%
162                 343    3.4%
186                 325    3.2%
234                 293    2.9%
142                 286    2.9%
170                 282    2.8%
Other values (168)  6244   62.4%

Minimum values

Value  Count  Frequency (%)
1      1      < 0.1%
4      24     0.2%
6      1      < 0.1%
7      8      0.1%
10     5      0.1%
12     1      < 0.1%
13     65     0.7%
14     1      < 0.1%
17     3      < 0.1%
19     1      < 0.1%

Maximum values

Value  Count  Frequency (%)
265    2      < 0.1%
264    24     0.2%
263    216    2.2%
262    163    1.6%
261    57     0.6%
260    2      < 0.1%
258    3      < 0.1%
257    2      < 0.1%
256    4      < 0.1%
255    7      0.1%

DOLocationID
Real number (ℝ)

Distinct: 207
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 163.579
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 43
Q1: 113
Median: 162
Q3: 234
95-th percentile: 257
Maximum: 265
Range: 264
Interquartile range (IQR): 121

Descriptive statistics

Standard deviation: 69.584249
Coefficient of variation (CV): 0.4253862
Kurtosis: -0.94946724
Mean: 163.579
Median Absolute Deviation (MAD): 69
Skewness: -0.34845111
Sum: 1635790
Variance: 4841.9678
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
236                 451    4.5%
237                 433    4.3%
161                 392    3.9%
170                 298    3.0%
230                 293    2.9%
239                 283    2.8%
186                 269    2.7%
141                 268    2.7%
48                  263    2.6%
142                 261    2.6%
Other values (197)  6789   67.9%

Minimum values

Value  Count  Frequency (%)
1      16     0.2%
3      2      < 0.1%
4      52     0.5%
7      17     0.2%
10     8      0.1%
11     3      < 0.1%
12     3      < 0.1%
13     62     0.6%
14     9      0.1%
15     1      < 0.1%

Maximum values

Value  Count  Frequency (%)
265    38     0.4%
264    36     0.4%
263    195    1.9%
262    168    1.7%
261    50     0.5%
260    7      0.1%
259    1      < 0.1%
258    3      < 0.1%
257    6      0.1%
256    23     0.2%

payment_type
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 10000
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 4
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      6957   69.6%
0      1614   16.1%
2      1160   11.6%
4      206    2.1%
3      63     0.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring characters

Value  Count  Frequency (%)
1      6957   69.6%
0      1614   16.1%
2      1160   11.6%
4      206    2.1%
3      63     0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  10000  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  10000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  10000  100.0%

fare_amount
Real number (ℝ)

High correlation 

Distinct: 1300
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.729521
Minimum: -232.6
Maximum: 250
Zeros: 1
Zeros (%): < 0.1%
Negative: 416
Negative (%): 4.2%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -232.6
5-th percentile: 4.4
Q1: 8.6
Median: 12.1
Q3: 19.1625
95-th percentile: 50.6
Maximum: 250
Range: 482.6
Interquartile range (IQR): 10.5625

Descriptive statistics

Standard deviation: 17.556368
Coefficient of variation (CV): 1.0494244
Kurtosis: 19.518952
Mean: 16.729521
Median Absolute Deviation (MAD): 4.9
Skewness: 2.1104963
Sum: 167295.21
Variance: 308.22606
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
9.3                  448    4.5%
7.9                  434    4.3%
10                   414    4.1%
7.2                  401    4.0%
8.6                  387    3.9%
10.7                 374    3.7%
6.5                  373    3.7%
12.1                 364    3.6%
11.4                 363    3.6%
5.8                  314    3.1%
Other values (1290)  6128   61.3%

Minimum values

Value   Count  Frequency (%)
-232.6  1      < 0.1%
-89.9   1      < 0.1%
-81.8   1      < 0.1%
-71.6   1      < 0.1%
-70     11     0.1%
-66     1      < 0.1%
-62     1      < 0.1%
-61.8   1      < 0.1%
-58.3   1      < 0.1%
-56.2   1      < 0.1%

Maximum values

Value  Count  Frequency (%)
250    1      < 0.1%
202.5  1      < 0.1%
200    1      < 0.1%
193.4  1      < 0.1%
184.3  1      < 0.1%
162.6  1      < 0.1%
158.4  2      < 0.1%
150    1      < 0.1%
146.5  2      < 0.1%
145.1  1      < 0.1%

extra
Real number (ℝ)

High correlation  Zeros 

Distinct: 31
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.332225
Minimum: -7.5
Maximum: 12.5
Zeros: 5071
Zeros (%): 50.7%
Negative: 67
Negative (%): 0.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -7.5
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2.5
95-th percentile: 5
Maximum: 12.5
Range: 20
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 1.8641433
Coefficient of variation (CV): 1.3992706
Kurtosis: 3.0001168
Mean: 1.332225
Median Absolute Deviation (MAD): 0
Skewness: 1.518012
Sum: 13322.25
Variance: 3.4750303
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=31)]

Common values

Value              Count  Frequency (%)
0                  5071   50.7%
1                  1550   15.5%
2.5                1530   15.3%
3.25               593    5.9%
4.25               324    3.2%
5                  308    3.1%
5.75               221    2.2%
3.5                109    1.1%
6                  64     0.6%
7.5                48     0.5%
Other values (21)  182    1.8%

Minimum values

Value  Count  Frequency (%)
-7.5   1      < 0.1%
-6     1      < 0.1%
-5     4      < 0.1%
-2.5   26     0.3%
-1     33     0.3%
-0.75  2      < 0.1%
0      5071   50.7%
0.75   7      0.1%
1      1550   15.5%
1.75   9      0.1%

Maximum values

Value  Count  Frequency (%)
12.5   6      0.1%
11.75  1      < 0.1%
11     7      0.1%
10.75  2      < 0.1%
10.25  2      < 0.1%
10     19     0.2%
9.25   13     0.1%
8.5    1      < 0.1%
8.25   12     0.1%
7.75   9      0.1%

mta_tax
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.0155
Min length: 3

Characters and Unicode

Total characters: 30155
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.5
2nd row: 0.5
3rd row: -0.5
4th row: 0.5
5th row: 0.5

Common Values

Value  Count  Frequency (%)
0.5    9752   97.5%
-0.5   155    1.6%
0.0    93     0.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value  Count  Frequency (%)
0.5    9907   99.1%
0.0    93     0.9%

Most occurring characters

Value  Count  Frequency (%)
0      10093  33.5%
.      10000  33.2%
5      9907   32.9%
-      155    0.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     20000  66.3%
Other Punctuation  10000  33.2%
Dash Punctuation   155    0.5%

Most frequent character per category

Category           Value  Count  Frequency (%)
Decimal Number     0      10093  50.5%
Decimal Number     5      9907   49.5%
Other Punctuation  .      10000  100.0%
Dash Punctuation   -      155    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  30155  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  30155  100.0%
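
Because mta_tax is profiled as a categorical variable, its length and character statistics describe the string form of each value ("0.5", "-0.5", "0.0") rather than the numbers themselves. The sketch below rebuilds the total character count, mean length and per-character counts from the Common Values table. Treating the values as exactly those three strings is an assumption based on the Sample rows.

```python
from collections import Counter

# String representations and counts copied from the mta_tax Common Values table.
value_counts = {"0.5": 9752, "-0.5": 155, "0.0": 93}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

# Weight each character by the number of rows holding the value it appears in.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

print(f"total characters: {total_chars}")
print(f"mean length: {mean_length}")
print(dict(char_counts))
```

The character "5" is counted once per row for both "0.5" and "-0.5", which gives the 9907 shown under Most occurring characters.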

tip_amount
Real number (ℝ)

High correlation  Zeros 

Distinct: 915
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.893703
Minimum: 0
Maximum: 50.2
Zeros: 3268
Zeros (%): 32.7%
Negative: 0
Negative (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2.39
Q3: 3.9
95-th percentile: 10
Maximum: 50.2
Range: 50.2
Interquartile range (IQR): 3.9

Descriptive statistics

Standard deviation: 3.5922181
Coefficient of variation (CV): 1.2413914
Kurtosis: 15.06332
Mean: 2.893703
Median Absolute Deviation (MAD): 2.26
Skewness: 2.9142675
Sum: 28937.03
Variance: 12.904031
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   3268   32.7%
2                   453    4.5%
1                   362    3.6%
3                   177    1.8%
5                   97     1.0%
2.95                85     0.9%
1.5                 77     0.8%
4                   72     0.7%
2.8                 64     0.6%
3.37                58     0.6%
Other values (905)  5287   52.9%

Minimum values

Value  Count  Frequency (%)
0      3268   32.7%
0.01   5      0.1%
0.02   3      < 0.1%
0.03   1      < 0.1%
0.05   2      < 0.1%
0.08   2      < 0.1%
0.1    3      < 0.1%
0.11   1      < 0.1%
0.45   1      < 0.1%
0.49   1      < 0.1%

Maximum values

Value  Count  Frequency (%)
50.2   1      < 0.1%
43.45  1      < 0.1%
39.3   1      < 0.1%
37.6   1      < 0.1%
35.09  1      < 0.1%
33.37  1      < 0.1%
33     1      < 0.1%
31.68  1      < 0.1%
30.36  1      < 0.1%
30.04  1      < 0.1%

tolls_amount
Real number (ℝ)

High correlation  Zeros 

Distinct: 36
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.445742
Minimum: -27.88
Maximum: 30.94
Zeros: 9374
Zeros (%): 93.7%
Negative: 13
Negative (%): 0.1%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -27.88
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 6.94
Maximum: 30.94
Range: 58.82
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.9530841
Coefficient of variation (CV): 4.381647
Kurtosis: 39.374332
Mean: 0.445742
Median Absolute Deviation (MAD): 0
Skewness: 4.4758295
Sum: 4457.42
Variance: 3.8145375
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=36)]

Common values

Value              Count  Frequency (%)
0                  9374   93.7%
6.94               560    5.6%
-6.94              11     0.1%
3.18               7      0.1%
16.06              4      < 0.1%
11.19              4      < 0.1%
13.88              4      < 0.1%
15.38              4      < 0.1%
14.06              3      < 0.1%
23                 2      < 0.1%
Other values (26)  27     0.3%

Minimum values

Value   Count  Frequency (%)
-27.88  1      < 0.1%
-14.06  1      < 0.1%
-6.94   11     0.1%
0       9374   93.7%
2       1      < 0.1%
2.6     2      < 0.1%
3.18    7      0.1%
4       1      < 0.1%
5.2     1      < 0.1%
6.94    560    5.6%

Maximum values

Value  Count  Frequency (%)
30.94  1      < 0.1%
27.94  1      < 0.1%
24.38  1      < 0.1%
24.06  1      < 0.1%
23     2      < 0.1%
22.56  1      < 0.1%
20.94  1      < 0.1%
20.88  1      < 0.1%
20.32  1      < 0.1%
20     1      < 0.1%

improvement_surcharge
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.016
Min length: 3

Characters and Unicode

Total characters: 30160
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: -1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count  Frequency (%)
1.0    9726   97.3%
-1.0   160    1.6%
0.0    114    1.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value  Count  Frequency (%)
1.0    9886   98.9%
0.0    114    1.1%

Most occurring characters

Value  Count  Frequency (%)
0      10114  33.5%
.      10000  33.2%
1      9886   32.8%
-      160    0.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     20000  66.3%
Other Punctuation  10000  33.2%
Dash Punctuation   160    0.5%

Most frequent character per category

Category           Value  Count  Frequency (%)
Decimal Number     0      10114  50.6%
Decimal Number     1      9886   49.4%
Other Punctuation  .      10000  100.0%
Dash Punctuation   -      160    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  30160  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  30160  100.0%

total_amount
Real number (ℝ)

High correlation 

Distinct: 2827
Distinct (%): 28.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.209691
Minimum: -236.85
Maximum: 301.2
Zeros: 1
Zeros (%): < 0.1%
Negative: 166
Negative (%): 1.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: -236.85
5-th percentile: 9.097
Q1: 15.18
Median: 19.945
Q3: 27.69
95-th percentile: 72.42
Maximum: 301.2
Range: 538.05
Interquartile range (IQR): 12.51

Descriptive statistics

Standard deviation: 21.756773
Coefficient of variation (CV): 0.86303211
Kurtosis: 15.879141
Mean: 25.209691
Median Absolute Deviation (MAD): 5.605
Skewness: 2.1646193
Sum: 252096.91
Variance: 473.35717
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
17.7                 79     0.8%
20.22                58     0.6%
16.02                56     0.6%
21.9                 56     0.6%
16.86                53     0.5%
21.06                52     0.5%
16.8                 49     0.5%
13.5                 48     0.5%
14.7                 45     0.4%
18.06                45     0.4%
Other values (2817)  9459   94.6%

Minimum values

Value    Count  Frequency (%)
-236.85  1      < 0.1%
-105.98  1      < 0.1%
-90.9    1      < 0.1%
-85.55   1      < 0.1%
-83.44   2      < 0.1%
-82.69   3      < 0.1%
-80.94   1      < 0.1%
-79      1      < 0.1%
-77.81   1      < 0.1%
-77.69   2      < 0.1%

Maximum values

Value   Count  Frequency (%)
301.2   1      < 0.1%
241.5   1      < 0.1%
237.53  1      < 0.1%
217.24  1      < 0.1%
214.71  1      < 0.1%
212.28  1      < 0.1%
210     1      < 0.1%
201.96  1      < 0.1%
190.06  1      < 0.1%
188.09  1      < 0.1%

congestion_surcharge
Categorical

High correlation  Imbalance  Missing 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1614
Missing (%): 16.1%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.0151443
Min length: 3

Characters and Unicode

Total characters: 25285
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 2.5
3rd row: -2.5
4th row: 2.5
5th row: 2.5

Common Values

Value      Count  Frequency (%)
2.5        7610   76.1%
0.0        649    6.5%
-2.5       127    1.3%
(Missing)  1614   16.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value  Count  Frequency (%)
2.5    7737   92.3%
0.0    649    7.7%

Most occurring characters

Value  Count  Frequency (%)
.      8386   33.2%
2      7737   30.6%
5      7737   30.6%
0      1298   5.1%
-      127    0.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     16772  66.3%
Other Punctuation  8386   33.2%
Dash Punctuation   127    0.5%

Most frequent character per category

Category           Value  Count  Frequency (%)
Decimal Number     2      7737   46.1%
Decimal Number     5      7737   46.1%
Decimal Number     0      1298   7.7%
Other Punctuation  .      8386   100.0%
Dash Punctuation   -      127    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  25285  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  25285  100.0%

Airport_fee
Categorical

High correlation  Imbalance  Missing 

Distinct: 3
Distinct (%): < 0.1%
Missing: 1614
Missing (%): 16.1%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.0835917
Min length: 3

Characters and Unicode

Total characters: 25859
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.75
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        7718   77.2%
1.75       635    6.3%
-1.75      33     0.3%
(Missing)  1614   16.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value  Count  Frequency (%)
0.0    7718   92.0%
1.75   668    8.0%

Most occurring characters

Value  Count  Frequency (%)
0      15436  59.7%
.      8386   32.4%
1      668    2.6%
7      668    2.6%
5      668    2.6%
-      33     0.1%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     17440  67.4%
Other Punctuation  8386   32.4%
Dash Punctuation   33     0.1%

Most frequent character per category

Category           Value  Count  Frequency (%)
Decimal Number     0      15436  88.5%
Decimal Number     1      668    3.8%
Decimal Number     7      668    3.8%
Decimal Number     5      668    3.8%
Other Punctuation  .      8386   100.0%
Dash Punctuation   -      33     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  25859  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  25859  100.0%

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 4
Mean length: 3.6555
Min length: 3

Characters and Unicode

Total characters: 36555
Distinct characters: 5
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.75
3rd row: 0.0
4th row: 0.0
5th row: 0.75

Common Values

Value  Count  Frequency (%)
0.75   6521   65.2%
0.0    3462   34.6%
-0.75  17     0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value  Count  Frequency (%)
0.75   6538   65.4%
0.0    3462   34.6%

Most occurring characters

Value  Count  Frequency (%)
0      13462  36.8%
.      10000  27.4%
7      6538   17.9%
5      6538   17.9%
-      17     < 0.1%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     26538  72.6%
Other Punctuation  10000  27.4%
Dash Punctuation   17     < 0.1%

Most frequent character per category

Category           Value  Count  Frequency (%)
Decimal Number     0      13462  50.7%
Decimal Number     7      6538   24.6%
Decimal Number     5      6538   24.6%
Other Punctuation  .      10000  100.0%
Dash Punctuation   -      17     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  36555  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  36555  100.0%

Interactions

[Figures: interaction plots]
2025-06-04T12:29:36.829645image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:42.895650image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:28:59.561146image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:03.263751image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:08.197411image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:13.113697image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:17.995565image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:22.823703image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:27.608056image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:32.422465image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:38.046484image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:43.428233image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:00.054271image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:03.894075image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:08.784012image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:13.663993image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:18.521578image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:23.349160image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:28.183786image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:32.914140image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:38.583999image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:43.823902image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:00.688087image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:04.453674image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:09.214138image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:14.120778image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:18.959071image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:23.828994image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:28.703772image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:33.517679image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:38.990616image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:44.306676image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:01.179117image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:04.906585image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:09.643095image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:14.557140image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:19.489084image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:24.223948image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:29.216396image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:34.047193image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2025-06-04T12:29:39.514145image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Correlations

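| | Airport_fee | DOLocationID | PULocationID | RatecodeID | VendorID | cbd_congestion_fee | congestion_surcharge | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport_fee | 1.000 | 0.076 | 0.366 | 0.032 | 0.035 | 0.133 | 0.331 | 0.429 | 0.605 | 0.320 | 0.310 | 0.028 | 0.202 | 0.000 | 0.374 | 0.522 | 0.640 | 1.000 |
| DOLocationID | 0.076 | 1.000 | 0.086 | -0.047 | 0.013 | 0.142 | 0.134 | 0.012 | -0.087 | 0.050 | 0.062 | -0.006 | 0.059 | 0.000 | 0.039 | -0.043 | -0.070 | -0.095 |
| PULocationID | 0.366 | 0.086 | 1.000 | -0.132 | 0.017 | 0.174 | 0.206 | -0.013 | -0.125 | 0.073 | 0.044 | -0.010 | 0.059 | 0.000 | -0.005 | -0.118 | -0.116 | -0.145 |
| RatecodeID | 0.032 | -0.047 | -0.132 | 1.000 | 0.220 | 0.161 | 0.417 | -0.135 | 0.347 | 0.941 | 0.014 | 0.050 | 0.052 | 0.000 | 0.044 | 0.471 | 0.321 | 0.286 |
| VendorID | 0.035 | 0.013 | 0.017 | 0.220 | 1.000 | 0.025 | 0.065 | 0.509 | 0.027 | 0.150 | 0.045 | 0.138 | 0.063 | 0.070 | 0.019 | 0.000 | 0.033 | 0.000 |
| cbd_congestion_fee | 0.133 | 0.142 | 0.174 | 0.161 | 0.025 | 1.000 | 0.384 | 0.202 | 0.140 | 0.273 | 0.258 | 0.027 | 0.139 | 0.028 | 0.050 | 0.109 | 0.131 | 0.000 |
| congestion_surcharge | 0.331 | 0.134 | 0.206 | 0.417 | 0.065 | 0.384 | 1.000 | 0.312 | 0.294 | 0.693 | 0.657 | 0.028 | 0.369 | 0.002 | 0.113 | 0.241 | 0.392 | 1.000 |
| extra | 0.429 | 0.012 | -0.013 | -0.135 | 0.509 | 0.202 | 0.312 | 1.000 | 0.059 | 0.311 | 0.305 | -0.075 | 0.244 | 0.103 | 0.299 | 0.129 | 0.189 | 0.048 |
| fare_amount | 0.605 | -0.087 | -0.125 | 0.347 | 0.027 | 0.140 | 0.294 | 0.059 | 1.000 | 0.341 | 0.406 | 0.033 | 0.164 | 0.000 | 0.350 | 0.393 | 0.957 | 0.798 |
| improvement_surcharge | 0.320 | 0.050 | 0.073 | 0.941 | 0.150 | 0.273 | 0.693 | 0.311 | 0.341 | 1.000 | 0.697 | 0.044 | 0.411 | 0.000 | 0.037 | 0.218 | 0.456 | 0.000 |
| mta_tax | 0.310 | 0.062 | 0.044 | 0.014 | 0.045 | 0.258 | 0.657 | 0.305 | 0.406 | 0.697 | 1.000 | 0.058 | 0.406 | 0.000 | 0.167 | 0.359 | 0.506 | 0.000 |
| passenger_count | 0.028 | -0.006 | -0.010 | 0.050 | 0.138 | 0.027 | 0.028 | -0.075 | 0.033 | 0.044 | 0.058 | 1.000 | 0.031 | 0.000 | 0.003 | 0.043 | 0.030 | 0.031 |
| payment_type | 0.202 | 0.059 | 0.059 | 0.052 | 0.063 | 0.139 | 0.369 | 0.244 | 0.164 | 0.411 | 0.406 | 0.031 | 1.000 | 0.019 | 0.117 | 0.102 | 0.196 | 0.011 |
| store_and_fwd_flag | 0.000 | 0.000 | 0.000 | 0.000 | 0.070 | 0.028 | 0.002 | 0.103 | 0.000 | 0.000 | 0.000 | 0.000 | 0.019 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| tip_amount | 0.374 | 0.039 | -0.005 | 0.044 | 0.019 | 0.050 | 0.113 | 0.299 | 0.350 | 0.037 | 0.167 | 0.003 | 0.117 | 0.000 | 1.000 | 0.220 | 0.525 | 0.279 |
| tolls_amount | 0.522 | -0.043 | -0.118 | 0.471 | 0.000 | 0.109 | 0.241 | 0.129 | 0.393 | 0.218 | 0.359 | 0.043 | 0.102 | 0.000 | 0.220 | 1.000 | 0.402 | 0.368 |
| total_amount | 0.640 | -0.070 | -0.116 | 0.321 | 0.033 | 0.131 | 0.392 | 0.189 | 0.957 | 0.456 | 0.506 | 0.030 | 0.196 | 0.000 | 0.525 | 0.402 | 1.000 | 0.769 |
| trip_distance | 1.000 | -0.095 | -0.145 | 0.286 | 0.000 | 0.000 | 1.000 | 0.048 | 0.798 | 0.000 | 0.000 | 0.031 | 0.011 | 1.000 | 0.279 | 0.368 | 0.769 | 1.000 |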
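
To make the pairwise values above easier to relate to the data, here is a minimal sketch that computes Spearman rank correlations with pandas for four numeric columns. The five rows are copied from the Sample section of this report; the frame name `df` is an assumption for illustration. The report's matrix may use different association measures depending on variable type and is computed on all 10000 observations, so these numbers are not expected to match it.

```python
import pandas as pd

# Five illustrative rows taken from the Sample section of this report
# (a tiny subset; not the full 10000-row dataset).
df = pd.DataFrame({
    "trip_distance": [18.80, 1.90, 0.77, 2.80, 0.40],
    "fare_amount": [72.3, 13.5, 7.2, 12.8, 5.8],
    "tip_amount": [0.00, 3.65, 2.44, 3.50, 2.00],
    "total_amount": [78.05, 21.90, 14.64, 21.05, 11.80],
})

# Spearman correlation is the Pearson correlation of the column ranks,
# so it is robust to the heavy right skew typical of fare data.
corr = df.corr(method="spearman")

print(corr.round(3))
```

On these five rows `fare_amount` and `total_amount` are ranked identically, which mirrors the strong association (0.957) the report shows for that pair.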

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
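
As a rough sketch of what nullity correlation means, the example below correlates the missing-value indicators of a few columns using pandas. The four rows are taken from the Sample section of this report; the names `df`, `null_mask`, `varying`, and `nullity_corr` are assumptions for illustration, and this is not necessarily how the report's heatmap was produced.

```python
import numpy as np
import pandas as pd

# Four illustrative rows from the Sample section; the third row has
# passenger_count, RatecodeID and congestion_surcharge missing together.
df = pd.DataFrame({
    "passenger_count": [1.0, 1.0, np.nan, 1.0],
    "RatecodeID": [1.0, 1.0, np.nan, 1.0],
    "congestion_surcharge": [2.5, 2.5, np.nan, 2.5],
    "cbd_congestion_fee": [0.00, 0.75, 0.00, 0.75],
})

# True where a value is missing.
null_mask = df.isnull()
null_counts = null_mask.sum()

# Columns that are never (or always) missing have zero variance, so their
# correlation is undefined; keep only columns whose nullity varies.
varying = null_mask.loc[:, null_mask.nunique() > 1]

# Correlate the 0/1 missingness indicators: 1.0 means two columns are
# always missing on the same rows.
nullity_corr = varying.astype(int).corr()

print(null_counts)
print(nullity_corr)
```

A value of 1.0 between two columns is consistent with the alerts in the overview, where passenger_count, RatecodeID, store_and_fwd_flag, and congestion_surcharge each report the same 1614 missing values.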

Sample

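| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | cbd_congestion_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 561277 | 1 | 2025-01-07 19:24:13 | 2025-01-07 20:11:51 | 1.0 | 18.80 | 1.0 | N | 132 | 26 | 1 | 72.3 | 4.25 | 0.5 | 0.00 | 0.0 | 1.0 | 78.05 | 0.0 | 1.75 | 0.00 |
| 1042820 | 1 | 2025-01-12 17:02:55 | 2025-01-12 17:17:34 | 1.0 | 1.90 | 1.0 | N | 107 | 246 | 1 | 13.5 | 3.25 | 0.5 | 3.65 | 0.0 | 1.0 | 21.90 | 2.5 | 0.00 | 0.75 |
| 26146 | 2 | 2025-01-01 09:55:16 | 2025-01-01 09:59:51 | 1.0 | 1.17 | 1.0 | N | 161 | 234 | 4 | -7.9 | 0.00 | -0.5 | 0.00 | 0.0 | -1.0 | -11.90 | -2.5 | 0.00 | 0.00 |
| 2345411 | 2 | 2025-01-25 22:30:52 | 2025-01-25 22:36:14 | 1.0 | 0.77 | 1.0 | N | 263 | 236 | 1 | 7.2 | 1.00 | 0.5 | 2.44 | 0.0 | 1.0 | 14.64 | 2.5 | 0.00 | 0.00 |
| 2388539 | 1 | 2025-01-26 12:48:51 | 2025-01-26 13:09:40 | 1.0 | 2.80 | 1.0 | N | 229 | 114 | 1 | 12.8 | 3.25 | 0.5 | 3.50 | 0.0 | 1.0 | 21.05 | 2.5 | 0.00 | 0.75 |
| 2957627 | 2 | 2025-01-03 22:20:01 | 2025-01-03 22:32:12 | NaN | 1.01 | NaN | NaN | 100 | 161 | 0 | 13.1 | 0.00 | 0.5 | 0.00 | 0.0 | 1.0 | 17.10 | NaN | NaN | 0.00 |
| 1639838 | 1 | 2025-01-18 16:10:41 | 2025-01-18 16:24:35 | 1.0 | 1.60 | 1.0 | N | 264 | 264 | 2 | 13.5 | 3.25 | 0.5 | 0.00 | 0.0 | 1.0 | 18.25 | 2.5 | 0.00 | 0.75 |
| 2532909 | 1 | 2025-01-28 09:28:09 | 2025-01-28 09:33:22 | 1.0 | 0.40 | 1.0 | N | 237 | 237 | 1 | 5.8 | 2.50 | 0.5 | 2.00 | 0.0 | 1.0 | 11.80 | 2.5 | 0.00 | 0.00 |
| 391680 | 2 | 2025-01-05 17:23:46 | 2025-01-05 17:30:01 | 1.0 | 1.40 | 1.0 | N | 249 | 68 | 1 | 8.6 | 0.00 | 0.5 | 5.00 | 0.0 | 1.0 | 18.35 | 2.5 | 0.00 | 0.75 |
| 1904355 | 2 | 2025-01-21 19:01:35 | 2025-01-21 19:11:46 | 1.0 | 1.39 | 1.0 | N | 230 | 234 | 1 | 10.7 | 2.50 | 0.5 | 3.59 | 0.0 | 1.0 | 21.54 | 2.5 | 0.00 | 0.75 |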
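
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | cbd_congestion_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2408297 | 1 | 2025-01-26 16:45:21 | 2025-01-26 16:58:31 | 1.0 | 2.20 | 1.0 | N | 231 | 186 | 1 | 14.20 | 3.25 | 0.5 | 3.75 | 0.0 | 1.0 | 22.70 | 2.5 | 0.0 | 0.75 |
| 651552 | 2 | 2025-01-08 18:18:26 | 2025-01-08 18:20:27 | 1.0 | 0.37 | 1.0 | N | 141 | 263 | 1 | 4.40 | 2.50 | 0.5 | 2.00 | 0.0 | 1.0 | 12.90 | 2.5 | 0.0 | 0.00 |
| 1553371 | 2 | 2025-01-17 18:30:01 | 2025-01-17 18:44:06 | 2.0 | 1.72 | 1.0 | N | 107 | 161 | 1 | 13.50 | 2.50 | 0.5 | 2.08 | 0.0 | 1.0 | 22.83 | 2.5 | 0.0 | 0.75 |
| 1054422 | 2 | 2025-01-12 19:39:35 | 2025-01-12 19:44:38 | 2.0 | 0.84 | 1.0 | N | 113 | 79 | 1 | 6.50 | 0.00 | 0.5 | 3.38 | 0.0 | 1.0 | 14.63 | 2.5 | 0.0 | 0.75 |
| 145165 | 2 | 2025-01-02 20:53:27 | 2025-01-02 21:08:58 | 1.0 | 3.80 | 1.0 | N | 162 | 262 | 1 | 19.80 | 1.00 | 0.5 | 6.20 | 0.0 | 1.0 | 31.00 | 2.5 | 0.0 | 0.00 |
| 20399 | 2 | 2025-01-01 04:23:33 | 2025-01-01 04:27:17 | 1.0 | 0.44 | 1.0 | N | 230 | 48 | 2 | 5.80 | 1.00 | 0.5 | 0.00 | 0.0 | 1.0 | 10.80 | 2.5 | 0.0 | 0.00 |
| 966668 | 2 | 2025-01-11 20:01:42 | 2025-01-11 20:24:06 | 1.0 | 5.19 | 1.0 | N | 170 | 87 | 1 | 26.10 | 1.00 | 0.5 | 6.37 | 0.0 | 1.0 | 38.22 | 2.5 | 0.0 | 0.75 |
| 3034464 | 1 | 2025-01-11 20:24:35 | 2025-01-11 20:33:36 | NaN | 0.90 | NaN | NaN | 140 | 237 | 0 | 11.92 | 0.00 | 0.5 | 0.00 | 0.0 | 1.0 | 15.92 | NaN | NaN | 0.00 |
| 813372 | 2 | 2025-01-10 12:06:05 | 2025-01-10 12:18:25 | 1.0 | 1.50 | 1.0 | N | 90 | 68 | 1 | 12.80 | 0.00 | 0.5 | 1.00 | 0.0 | 1.0 | 18.55 | 2.5 | 0.0 | 0.75 |
| 501158 | 1 | 2025-01-07 09:01:07 | 2025-01-07 09:10:31 | 1.0 | 1.80 | 1.0 | N | 236 | 142 | 1 | 11.40 | 2.50 | 0.5 | 0.00 | 0.0 | 1.0 | 15.40 | 2.5 | 0.0 | 0.00 |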